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> . 1 INTRODUCTION 



ABSTRACT 

We present the first high-resolution images of CSWA 31, a gravitational lens system 
observed as part of the SLUGS (Sloan Lenses Unravelled by Gemini Studies) pro- 
gram. These systems exhibit complex image structure with the potential to strongly 
constrain the mass distribution of the massive lens galaxies, as well as the complex 
morphology of the sources. In this paper, we describe the strategy used to reconstruct 
the unlcnscd source profile and the lens galaxy mass profiles. We introduce a prior 
distribution over multi-wavelength sources that is realistic as a representation of our 
knowledge about the surface brightness profiles of galaxies and groups of galaxies. To 
carry out the inference computationally, we use Diffusive Nested Sampling, an effi- 
cient variant of Nested Sampling that uses Markov Chain Monte Carlo (MCMC) to 
sample the complex posterior distributions and compute the normalising constant. We 
demonstrate the efficacy of this approach with the reconstruction of the group-group 
gravitational lens system CSWA 31, finding the source to be composed of five merging 
spiral galaxies magnified by a factor of 13. 
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Gravitational lensing has revealed itself to be a powerful 
probe of not only the distribution of luminous and dark mat- 
ter within the lensing galaxies, but also the small scale struc- 
ture apparent in sources when the images are reconstructed. 
Initially, with point-like quasar sources, only limited infor- 
mation was available to constrain reconstructions, but the 
identification of extended images has led to the development 
of a number of techniques to maximize the use of additional 
information in determining both the lens and source prop- 
erties (for a recent review of strong lensing by galaxies, see 
lTreull2010h . 

The development of large scale surveys has allowed for 
the automatic search and identification of large numbers of 
gravitational lens systems. A prominent example is SLACS 
(Sloan Lens ACS Survey), which uses HST/ACS to follow- 
up lens candidates identified in the Sloan Digital Sky Survey 
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(SDSS) (|Bolton et all 120081 ). SLACS targets small separa- 
tion lenses, in which the images are unresolved by SDSS, 
typically through the spectroscopic identification of emis- 
sion lines from high redshift objects within the spectra of 
lower redshift early-type galaxies. Thus, SLACS has been 
highly successful in using lensing to study the properties of 
typical elliptical galaxies (|Treu et alj|2009h . 

Alternatively, the Cambridge And Slo an Surey of Wide 
ARcs in the skY (CASSOWARY) survey |Belokurov et all 
|2009| . http: //www . ast . cam. ac .uk/research/ cassowary/), 
explores a different regime of galaxy-galaxy lens systems by 
searching for multiple blue faint companions around mas- 
sive luminous red galaxies in SDSS. Thus, CASSOWARY 
is suited to finding wide separation lenses with high mag- 
nification. So far, CASSOWARY has identified 40 systems, 
with several more unconfirmed candidates. These lenses are 
suited to utilising the unique properties of gravitational lens- 
ing to provide information about the mass profiles of massive 
galaxies and galaxy groups, and to provide very high resolu- 
tion images of faint hi gh-redshift galax i es. Pr evious studies 
of group lenses include [Limousin et all (|2009l h 
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In this paper we present the first analysis of high reso- 
lution Gemini/GMOS images of a confirmed CASSOWARY 
lens system, taken as part of the SLUGS (Sloan Lenses Un- 
raveled by Gemini Studies) program. The focus of this paper 
is the reconstruction technique, with the details of the ex- 
tension to other CASSOWARY lenses being presented else- 
where (Belokurov et al in prep.) In Section [2] we describe 
the approach to gravitational lens modelling that will be 
adopted for the SLUGS systems, discussing the prior source 
information in Section f2.ll the adopted lens models in Sec- 
tion [2]2] and the implementation of nested sampling in Sec- 
tion 2] The second part of the paper, Section [5] we will 
present the implementation of this scheme on CSWA 31; a 
newly confirmed gravitational lens and part of the SLUGS 
sample. The conclusions of this study are presented in Sec- 
tion [6] 



2 GRAVITATIONAL LENS 
RECONSTRUCTION 

Grav itational lens reconstru ction is a classic inference prob- 
lem (jBrewer fc Lewis! l2006h : given some data D (i.e. image 
pixel values) and prior information (and/or assumptions), 
we wish to infer various properties of the lens system such 
as its source parameters S, the lens parameters L, and var- 
ious other parameters, denoted collectively by $; for exam- 
ple, $ may include parameters describing the light profile 
of the lens galaxy, and unknown noise levels or systematic 
effects. By Bayes' rule, our knowledge of these parameters, 
taking into account the data, is described by the posterior 
probability distribution 

p(S, L, *\D) cx p(S, L, *)p(D\S, L, *) (1) 
= p(S)p(L\S)p($\L, S)p(D\S, L, *) (2) 

Different lens modellin g techniques can all b e interpreted 
within this framework (Brewer fc Lewis! l2006h . Differences 
between them are mostly due to the effective choices for the 
probability distributions p(S, L,&), describing prior knowl- 
edge about the system, and p(D\S, L, $), describing prior 
knowledge about how the observed data are related to the 
system. The other area where lens modelling techniques dif- 
fer is the strategy for summarising the posterior distribu- 
tion: typically one either searches for the peak to find a 
best-fitting model, or else one tries to obtain a sample of 
typical models from the posterior distribution. The latter 
approach is more computationally expensive but results in a 
more honest summary of the uncertainty in the conclusions, 
as the posterior probability of any hypothesis can be easily 
calculated. Throughout this paper, we will use a sampling- 
based approach, prioritising realism over convenience. 

2.1 A Realistic Prior on Sources 

To begin the inference, we must construct a prior probabil- 
ity distribution to describe what kinds of sources are consid- 
ered plausible. Ideally, the prior distribution should contain 
all of the relevant information that we have about typical 
light profiles of galaxies, but should otherwise make minimal 
assumptions about features of the light profile that we are 
actually uncertain about. We will now list features that an 



ideal prior distribution for a galaxy source should have. Var- 
ious priors have been proposed and used in the literature, 
usually satisfying some of these criteria but not others. In 
Section [2. 1.1 1 we develop a prior that has all of these prop- 
erties. This is not to claim that our prior is optimal, or the 
best one could do, merely that it avoids some of the most 
obvious flaws. 

• The surface brightness should be non-negative every- 
where, with 100% prior probability. This rules out the use 
of "linear regularisation" , i.e. pixellated intensity maps 
with a joint Gaussian prior for the intensity values (e.g. 
ISuvu et al.ll2006l ; iDve et al.ll200Sl) . 

• The surface brightness may be close to zero ove r 
a large fraction of the domain l|Brewer fc Lewis! 120061 ) . 
This rules out pixellated intensity maps with con- 
strained uniform prior, and the "maximu m entropy" prior 
jWallington, Kochanek. fc Naravanlll996h . 

• Surface brightness at one point should be correlated 
with surface brightness at another nearby point. This 
correlation should decrease with distance. 

• The degree of spatial correlation may vary across 
the domain: i.e. we should not rule out the possibility 
of smoothness in one region and small scale structure in 
another. 

• The prior should be convenient to explore via Markov 
Chain Monte Carlo (MCMC). 

• There is probably some degree of correlation between 
the surface brightness profile in different wavebands, al- 
though this is not guaranteed. 

The statistics literature contains many proposals for 
general prior distributions for density functions (e. g. the 
Dirichlet distribution), which may be adapted for our pur- 
pose; however, the context here is different: our densities 
will be surface brightness profiles and not probability densi- 
ties for data points. Thus, our concern is not so much with 
the mathematical properties of the priors but with which 
ones produce density profiles that resemble galaxies. We at- 
tempt to satisfy all of the design criteria listed above with 
our blobby prior described in the next section. 



2.1.1 Blobby Prior 

Our models of surface brightness profiles will ultimately be 
constructed from some basis functions, also known collo- 
quially as "blobs". Specifically, we choose elliptical Sersic 
profiles as our basis functions, since in some cases even a 
single Sersic profile is sufficient to describe the light profile 
of a galaxy. This approach of using moder ate numbers of 
physically mo tivated profiles is not new (e.g. lMarshaiill2006l : 
ISkillinall998l ). but is a simple way of implementing a prior 
with the desired properties outlined above. 

In a suitably centred and aligned Cartesian coordinate 
system, a blob with scale radius Ro and axis ratio q has the 
following profile: 
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f(x,y) = Cexp 



(3) 



where R — \f x 2 jq + qy 2 . When a — 2, the blob is a Gaus- 
sian; a = 1 corresponds to an exponential profile, and 
a = 1/4 is the de Vaucouleurs profile. We restrict a to be 
between 0.1 and 10. We also allow the number of blobs, 
numBlobs, to be a free parameter. 

Since we will be analysing multi-band images, we must 
also assign a prior probability distribution to the colours of 
the blobs. To accomplish this, each blob has a set of param- 
eters called unnormalisedSpectrum, describing how bright 
the blob is in each band relative to the other bands. The 
blobs can all have similar colours, or the colours can vary 
immensely. The degree of diversity of the spectra is con- 
trolled by a hyperparameter called sigSpectrum. The hyper- 
parameter meanSpectrum describes the typical colour that 
the blobs are centred around. 

The prior for the central positions of the blobs was cho- 
sen to be a an independent Gaussian distribution for each 
blob, with a common centre position and standard devia- 
tion, as hyperparameters. This allows the blobs to be ei- 
ther close to concentric, or not, as required by the data. If 
the blobs are almost concentric, then sums of several blobs 
can accurately model most decreasing radial profiles. Multi- 
component sources can also be modelled, they correspond to 
the case where the blobs are separated by more than about 
one scale radius. We have experimented with more sophisti- 
cated schemes to correlate the central positions of the blobs 
l |NeaJl2003h but found that the gain in realism was minor, 
yet the computational complexity of the problem increased 
significantly. 

The typical peak surface brightness of blobs was as- 
signed a prior that spans several orders of magnitude, cen- 
tred around unity. Thus, the reconstruction software expects 
images in units where the fluxes are of order close to unity. 
The actual peak surface brightness of each blob has a log- 
normal prior, centered around the typical value. 

For a thorough description of all of the parameters and 
the prior distributions that define our blobby prior, see Ta- 
ble [TJ Figure \T\ shows sample galaxy light profiles drawn 
from the prior. We recommend, as a general practice, gen- 
erating random models from the prior as a visual check on 
the reasonableness of the prior probability assignments. 



2.2 Lens Models 

In cases where the light profile of the lens galaxy appears 
simple, we assume that the mass profile is also simple and 
model the projected mass profile of the lens galaxy as a non- 
singular isothermal ellipsoid (NIE) plus external shear (7). 
This model (and related models, e.g. singular isothermal el- 
lipsoid without external shear) has been used very success- 
fully to model the mas s profiles of elliptical galaxy lenses 
(e.g. iBolton et al.l l2008f ); even when more general models 
have been used, the mass profiles are inferred to be close 
to isothermal. The model has eight parameters: b, the "Ein- 
stein radius", q, the axis ratio of the major and minor axes 
of the elliptical mass distribution, (x c ,y c ), the central posi- 
tion of the lens, 8, the orientation angle of the lens, 7, the 
strength of the external shear, # 7 , the orientation angle of 
the external shear, and r c , the core radius (often set to zero, 



giving a singular isothermal ellipsoid). A simple interpreta- 
tion of b is the length of the minor axis of the critical curve, 
in the singular case r c = 0. 

The source plane position (x s ,y 3 ) and the image plane 
position (x, y) of a ray are related by: 



x s — x — a x (x, y) 
Vs=y~ OL y (x,y) 



(4) 



where the deflec tion an gles a are given by 
iKeeton fc Kochanekl (|l998l ). Defining V = 
\J q 2 (r 2 + x 2 ) + y 2 , the deflection angle formulae are: 



a x {x,y) 



a y {x,y) = 



: tan 



: tanh 1 



tf> + r a 

vV 1 ~ 1 



ip + q r c 



(5) 
(6) 



as long as q < 1. If q > 1, then q can simply be replaced by 
q^ 1 and the orientation angle 8 rotated by 90 degrees, with 
the same result. We do not allow q = 1 but it can become 
arbitrarily close to 1 and therefore close-to-spherical lenses 
are not ruled out. By making the prior for the lens strength 
b depend on the image size, we are able to avoid the need 
for system-specific tuning of the priors: we simply assume 
that the input images are such that the Einstein Radius b 
of the lens is of the same scale as the input image. 



2.3 Complex Lenses 

Since the SLUGS sample contains very massive lenses, many 
of the lenses are not smooth galaxies but are instead galaxy 
groups. For modelling these systems, the NIE family is not 
flexible enough to represent the complex structure of the 
lens. To overcome this, it is possible to use the blobby prior 
of Section 12.1.11 but interpret each blob as a nonsingular 
isothermal ellipsoid (with zero external shear) rather than 
a Sersic profile. This is straightforward as there is a simple 
correspondence between the parameters of the Sersic pro- 
file and the parameters of the NIE. However, this creates 
computational inefficiencies, as the code needs to infer the 
positions and orientations of many lens components, and the 
likelihood function tends to be more difficult when modify- 
ing the lens as opposed to the source. We note that gravita- 
tional interactions within groups will tend to make the NIE 
approximation for each galaxy less valid, however sums of 
NIEs remain a tractable and flexible family of models. 

For complex lenses in the SLUGS sample, we use the fol- 
lowing strategy. First, we visually identify the approximate 
positions, ellipticities and orientations of the dominant lens 
components. The prior for these parameters can then be set 
to be narrow Gaussians centred on the estimated values. 
The Einstein radii {6} of these lenses can then be assigned 
a broad prior, such as independent cx 1/6 priors within a 
generous range. This prior suggests that from the images, 
we know where the lens substructures are, but we do not 
know their relative masses, because the mass to light ratio 
might differ between objects. 
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Figure 1. A sample of random light profiles (source galaxies generated from the prior distribution defined in Section 12.1.11 Simple 
and complex profiles are possible, as are a variety of sizes and colours. The peak surface brightness is also uncertain by several orders 
of magnitude. W hile this prior clearly is not an optimal assignment, it is a more realistic assignment than regularisation formulae 
jSuvu et al.ll200d) . for which the corresponding images would resemble noise (for first-order regularisation) or blurred noise (for gradient 
or curvature regularisation). 



3 SAMPLING DISTRIBUTIONS 



each pixel: 



The prior distributions for the unknown parameters have 
now all been defined. However, we must also assign sampling 
distributions p(S, L, 5>|-D) for all D and (S, L, $) (these are 
known as sampling distributions when D is unknown, when 
D is fixed at the observed data, p(S, L,<&\D) as a function 
of the parameters is called the likelihood function). For any 
value of the source and lens parameters (S, L) we can com- 
pute mock images M(S, L) by ray tracing through the lens 
(firing 4 rays per image pixel, a compromise value chosen for 
efficiency) and blurring by the PSF (assumed known from 
nearby stars in the field). The likelihood is then related to 
the x 2 deviation between the mock image and the actual 
image; i.e. we assume a sampling distribution for the noise 
that is normal/Gaussian with mean 0, and independent for 



P(D\S,L,Q) 



exp 



(7) 



where n is the total number of pixels, and the standard 
deviation for each pixel is given by: 



+ (t^M{S,L)^' 



(8) 



Here, o" map is the uncertainty reported from the weight map, 
and the extra parameters <7base and r allow for inaccuracies 
in the weight map, by possibly boosting the noise level by a 
constant value, or a value proportional to the square root of 
the predicted surface brightness, o^asc and r were assigned 
J(10 -3 , 10 3 ) priors. Once again, this assumes that the input 
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Table 1. All of the parameters of a multi- wavelength blobby light profile, and their prior distributions. If the prior for a parameter refers 
to a hyperparameter, then this defines the prior conditional on that hyperparameter. The priors for the hyperparameters themselves 
are included in the second half of the table. All priors are independent except where otherwise specified. J(a, ft) denotes Jeffreys' scale- 
invariant prior p(x) oc 1/x between lower and upper bounds of a and b respectively. Lognormal(a, ft 2 ) denotes a lognormal distribution, 
with a being the central value for the variable, and ft being the standard deviation for the log of the variable. Note that a may be named 
mean*, even though it is not technically the mean of the distribution. 



Parameter 


Definition 


Prior 


Individual Blob Parameters 


peak 


Peak surface brightness, C in Eqn|3] 


Lognormal(meanPeak, stdevLogPeak 2 ) 


width 


Scale width (Ro in Eqn(3j 


Exponential(meanWidth) 


xc, yc 


Central position 


Normal((X c , Y c ), spread 2 ) 


q 


Axis ratio 


Uniform(0.1,l) 


theta 


angle of Orientation 


Uniform(0, 2tt) 


slope 


Slope of light profile, a in Eqn|3] 


Lognormal(meanAlpha, sigAlpha 2 ) 


unnormalisedSpectrum 


Share of bolometric flux at each wavelength 


Lognormal(meanSpectrum, sigSpectrum 2 ) 


Hyperparameters 


numBlobs 


Number of blobs 


Uniform({0, 1,2, ...,10}) 


(X C ,Y C ) 


Typical position that blobs are centred around 


Uniform over source domain 


spread 


Standard deviation of blob positions 


J(10 -3 ,l)x (image width) 


meanPeak 


Typical brightness of a blob 


J(10" 3 ,10 3 ) 


stdevLogPeak 


Diversity of brightness of blobs 


J(0.03, 2) 


sigSpectrum 


Diversity of spectra 


J(0.01,3) 


meanWidth 


Typical scale width of a blob 


J(l(r 3 ,l)x (image width) 


meanSpectrum 


Typical spectrum that blob spectra are centred around 


Lognormal(0, 3 2 ) 


meanAlpha 


Typical slope 


■7(0.1, 10) 


sigAlpha 


Diversity of slopes 


J(0.01,2) 



images are in units where the typical pixel values are of order 
unity. 

Although it is not presented in this paper, we also 
recommend generating simulated data sets from the sam- 
pling distribution, using models drawn from the prior. This 
practice gives visual checks on the joint prior, p(8, D) = 
p(8)p(D\6) which inference depends on. 

The computational task of using images to infer appro- 
priate values for all of the above parameters, with uncer- 
tainties, is rather difficult. To accomplish this goal, we use a 
variant of Nested Sampling (ISkillinell2006h . designed to cope 
effectively with multimodal and highly correlated posterior 
distributions. This algorithm is described briefly in Sec- 
tion \$ \ For a full description, see lBrewer. Partav. fc Csanvil 
(1201011 . 



4 NESTED SAMPLING IMPLEMENTATION 

Nested Sampling (NS) is a powerful and widely ap plicable 
algorithm for Bayesian computation (|Skillind 12009) . Start- 
ing with a population of particles {#;} drawn from the prior 
distribution ir(9), the worst particle (lowest likelihood L(0)) 
is recorded and then replaced with a new sample from the 
prior distribution, subject to the constraint that its likeli- 
hood must be higher than that of the point it is replacing. 
As this process is repeated, the population of points moves 
progressively higher in likelihood. 

Each time the worst particle is recorded, it is assigned 
a value X £ [0,1], which represents the amount of prior 
mass estimated to lie at a higher likelihood than that of 



the discarded point. Assigning X- values to points creates a 
mapping from the parameter space to [0, 1], where the prior 
becomes a uniform distribution over [0, 1] and the likelihood 
function is a decreasing function of X. Then, the evidence 
can be computed by simple numerical integration, and pos- 
terior weights can be assigned by assigning a width to each 
point, such that the posterior mass associated with the point 
is proportional to width x likelihood. 

The key challenge in implementing Nested Sampling for 
real problems is to be able to generate the new particle from 
the prior, subject to the hard likelihood constraint. If the 
discarded point has likelihood L* , the newly generated point 
should be sampled from the constrained distribution: 



tt(0) f 1, L{6) > L* 



X* 



0, otherwise. 



(9) 



where X* is the normalising constant. Technically, our 
knowledge of this new point should be independent of all 
of the surviving point s. A simple way to ge nerate such a 
point is suggested by ISivia fc Skillind ( 2006): copy one of 
the surviving points and evolve it via MCMC with respect 
to the prior distribution, rejecting proposals that would 
take the likelihood below the current cutoff value L* . This 
evolves the particle with Equation [9] as the target distri- 
bution. If the MCMC is done for long enough, the new 
point will be effectively independent of the surviving popula- 
tion and will be distributed according to Equation [9] How- 
ever, in complex problems, this approach can easily fail - 
constrained distributions can often be very difficult to effi- 
ciently explore via MCMC, particularly if the posterior dis- 
tribution is multimodal or highly correlated. To overcome 
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these drawbacks, sophisticated s chemes such as MultiNest 
l|Feroz, Hobson. fc Bridges] |200S| ) have been developed and 
applied successfully in low-dimensional problems. However, 
for our purposes we require a method that uses MCMC. 

The main advantage of Nested Sampling is that succes- 
sive constrained distributions (i.e. Pl*(9),Pl* +1 (@), and so 
on) are, by construction, all compressed by the same factor 
relative to their predecessor. This is not the case with tem- 
pered distributions of the form pt(6) oc it {6)L{9) 1,T , where 
a small change in temperature T can correspond to a small 
or a large compression. Tempering based algorithms (e.g. 
simulated annealing, parallel tempering) will fail unless the 
density of temperature levels is adjusted according to the 
specific heat, which becomes difficult at a first-order phase 
transition. Unfortunately, knowing the appropriate values 
for the temperature levels is equivalent to having already 
solved the problem! Nested Sampling does not suffer from 
this issue because it asks the question "what should the next 
distribution be, such that it will be compressed by the de- 
sired factor" , rather than the tempering question "the next 
distribution is pre-defined, how compressed is it relative to 
the current distribution?" 

4.1 Diffusive Nested Sampling 

Throughout this pa per we use the efficient Diffusive Nested 
Sampling method (jBrewer. Partav. fc Csanvll |2010T ) which 
has been found to speed up MCMC-based Nested Sampling 
by an order of magnitude. The basic idea is that, instead 
of exploring a single constrained distribution, the particle 
explores a mixture of the required constrained distribution, 
and the past constrained distributions (See Figure [5]). New 
likelihood contours are created using likelihoods accumu- 
lated during the run, which provides more information than 
using only the likelihood value of the endpoint of an MCMC 
run. Finally, the absence of the copying operation prevents 
the depletion problem that can occur in the classic imple- 
mentation of NS, where repeated copying destroys the di- 
versity of t he population of particles premat urely. For more 
details, see lBrewer. Partav. fc Csanvil l|201(t ). 

4.2 Nested Sampling Solves Multiple Problems at 
Once 

Nested Sampling algorithms provide samples from the pa- 
rameter space, along with an assignment of prior mass 
(width in the X-space) to each sample. These samples can be 
used to probe (generate samples and estimate the normalis- 
ing constant) target probability distributions other than the 
posterior. For example, a common family of distributions of 
interest are the tempered posterior distributions: 

Pt{6) = ^y^(W) 1/T (10) 

These distributions can be studied by weighting the particles 
according to (prior mass ) XL 1 ' 7 , for various T rather than 
just T = 1. 

Of course, this is only possible if the sample contains 
points from regions of parameter space that are significant 
with respect to the modified target distribution. 

This technique can be used to solve the inference prob- 
lem under different prior probability assignments at very 



little computational cost. Particularly, we can explore the 
effect of different sampling distributions (Section[3]). For ex- 
ample, suppose we hypothesise a correlated probability dis- 
tribution for the noise. This means that larger, correlated 
deviations between the model and the data can be ascribed 
to noise, rather than model inadequacy. Thus, samples with 
lower X-values will be deemed acceptable and included in 
the sample. The effect of correlated noise models is very 
similar to the effect of simply raising the temperature, an 
operation that is computationally trivial. 

Note that not all systematic errors can be accounted for 
in this way. For example, imagine that the sky background 
had been subtracted incorrectly, leaving a gradient across 
the image. This would be modelled using light in the source 
plane. If we attempted to reweight the sample by using the 
marginal likelihood after integrating over a prior distribu- 
tion for the incorrect sky subtraction, we would find that 
one point dominated the sample. This immediately provides 
evidence of failure: essentially, Nested Sampling would have 
scaled the wrong peak in parameter space, and did not ob- 
tain any significant samples with respect to the modified 
posterior. This situation is easily identified (only a single 
particle will have significant weight) and is analogous to us- 
ing a poor approximating distribution in standard impor- 
tance sampling. 



5 CSWA 31 

CSWA 31 is a complex lens with at least seven identifi- 
able images and a very large image separation (~ 12") (Fig- 
ure [31). The lens is a galaxy group, with a dominant central 
galaxy acting as the primary lens. We obtained g,r, and i- 
band images for CSWA31 on February 21, 2009, using the 
CMOS instrument on the Gemini South telescope. 1200 sec- 
ond dithered exposures were obtained in each of the bands, 
in seeing ranging from 0.7-0.75". The CMOS field of view is 
5.5 x 5.5 arcmin, and with 2x2 binning, the image scale is 
0.145" /pixel. In Figures 3-5, north is towards the bottom of 
the page. 

Due to the presence of mass substructure (as seen by 
the large number of yellow blobs in the image), we used 
the complex lens model (Section 12. 3[) for this system. The 
dominant yellow sources in the image were modelled as 11 
SIE lenses, with their positions, ellipticities and orientation 
angles almost-fixed at the observed values from the light (the 
prior allowed for a small uncertainty in each). We allowed the 
Einstein radii {b} to be free parameters with independent 
scale-free priors, truncated such that the primary lens must 
have the highest b. The central positions of the lens blobs 
are shown in Figure 2] 

Unfortunately the complex lens model creates large 
computational overheads: the time taken to simulate an im- 
age increases with the number of lens "blobs". Additional 
computational expense is caused by the fact that the pa- 
rameter space now has more dimensions, causing Diffusive 
Nested Sampling to require more steps per new level, and 
more levels to reach the bulk of the posterior. Finally, the 
large separation of the lens implies that the image simply 
has more pixels. To reduce the effect of these computational 
challenges, we analyse only a degraded resolution version of 
the CSWA 31 image, containing only the most significant im- 
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Figure 2. An illustration of a the distributions that must be sampled as Nested Sampling progresses {Brewer. Partav, k. Csanvill2010l y 
In the classic scheme, at Step 3, we must obtain a sample from the coloured region which is composed of two separate islands, which is 
usually very difficult if MCMC is the only exploration option. To overcome these difficulties, we explore the mixture distribution (bottom 
right), where travel between isolated modes is more likely. 
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Figure 3. SLUGS Gemini/GMOS Image of CSWA 31, with the primary lens galaxy subtracted (left). The right hand image shows the 
bluest image of CSWA 31 (where the redder lens substructures arc not visible), resampled onto a 64x64 pixel grid, for computational 
speed. Only the brightest lensed images were retained. Preliminary studies indicated that four of the images (the blue images in the 
modified image) are a classic quad configuration. For computational reasons we gave these images a common colour, to help our algorithm 
find the correct model more quickly. 



ages (Figure [3] shows the full and the degraded image). This 
means our final reconstruction will have only the brightest 
of the source components that are actually present. 

A further challenge is caused by the fact that it is not 
clear which sources are in the lens plane and which are back- 
ground sources, and even whether the background sources 
are all at the same redshift. For this initial study, we assume 
that all images in the right panel of Figure [3] are images of 
sources lying at a single redshift. 

The modelling was run with a maximum of five sources. 
A typical model from the run is shown in Figure [5] The 
source is comprised of five galaxies of extent ~ 2" , separated 



by ~ 5". The source (1) has been lensed into a classic quad 
configuration by the primary lens. Note that, according to 
our model, the leftmost image of (2) and the rightmost image 
of (3) are different sources and so may be different colours, 
whereas they appear to match in the image (Figure [3}. We 
attempted to find models where these two images are images 
of the same source, but we were unable to do so. The image 
in Figure [3] shows additional small green sources that may 
also lie scattered throughout the source group. 

Unfortunately the source redshifts are unknown so the 
physical scale is also unknown, however we are able to pro- 
vide an estimate. The redshift of the lens is 0.683. Assum- 
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Figure 4. Mean positions of the SIE lenses for the complex lens model for CSWA 31. The prior uncertainty on the positions is 1". 
The relative masses are left as free parameters, with broad priors. The eleven lens components chosen were the ones that appear most 
luminous and close to images, and therefore most likely to be significant lenses. Not all of the yellow components in Figure [3] were 
included for computational efficiency. 



ing a source redshift z s — 1 gives a source plane scale of 
8.3 kpc/", implying that these source galaxies are large spi- 
rals. Assuming z s = 0.8 gives 7.7 kpc/" and z s = 2 gives 
9.3 kpc/", so the physical interpretation is insensitive to the 
source redshift. The magnification of the source is 2.8 ± 0.1 
magnitudes, a factor of ~ 14. 

We measure the mass of the primary lens, within its 
critical curve, to be 1.4 ± 0.2 x 10 13 M Q . The other lenses 
masses are significantly uncertain, so we do not report them 
here. 



6 CONCLUSIONS 

In this paper we have detailed the lens modelling technique 
to be used for the complex gravitational lenses in the SLUGS 
(Sloan Lenses Unravelled by Gemini Studies) sample. The 
main features of our technique are the attempt to incorpo- 
rate realistic prior information about galaxy surface bright- 
ness profiles, and the Diffusive Nested Sampling method for 
exploring the posterior distribution for the reconstructions. 
We recommend drawing random samples from complex prior 
distributions as a visual check on the reasonableness of the 
probability assignments. Our random samples qualitatively 
resemble simple galaxy profiles, whereas the corresponding 
random samples from other methods, such as linear regu- 
larisation, are simply noise. Hence, our method uses much 
more prior information and can be expected to perform bet- 
ter than the alternatives when the data are noisy or low 
resolution. 



We presented the first reconstruction of the complex 
SLUGS lens CSWA 31. We found that the observed lensing 
configuration is best explained by five source components, 
lensed into the elaborate image structure by the complex 
group lens. Although the source redshift is unknown, it is 
plausible that the physical sizes of the sources are ~ 10 kpc 
and hence we are observing the merger of five large spiral 
galaxies. 
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Figure 5. A typical model from the posterior distribution for the CSWA 31 reconstruction. The source is composed of five components, 
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complex manner into three main images, and (3) is doubly imaged, 
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